Abstract: The mean occupancy rate of personal vehicle trips
in the United States is only 1.6 persons per vehicle mile. Urban
traffic gridlock is a familiar scene. Ridesharing has the potential
to solve many environmental, congestion, and energy problems.
In this paper, we introduce the problem of large scale real-time
ridesharing with service guarantees on road networks. Servers and
trip requests are dynamically matched while the waiting time and
service time constraints of trips are satisfied. We first propose two
basic algorithms: a branch-and-bound algorithm and an integer
programming algorithm. However, the structure of these algorithms does
not adapt well to the dynamic nature of the ridesharing problem.
Thus, we then propose a kinetic tree algorithm capable of better
scheduling dynamic requests and adjusting routes on-the-fly. We
perform experiments on a large real taxi dataset from Shanghai.
The results show that the kinetic tree algorithm achieves shorter
response times than the other algorithms.

I. Introduction 

Despite our struggle with energy, pollution, and congestion, 
many private and public vehicles continue to travel with empty 
seats. The mean occupancy rate of personal vehicle trips in 
the United States is only 1.6 persons per vehicle mile [1]. In
1999, if 4% of drivers had shared rides, it would have completely
offset that year's increase in congestion in the 68 urban areas [2].
Several large cities have begun to encourage taxi sharing.

Real-time ridesharing [3], [4], [5], enabled by low-cost
geo-locating devices, smartphones, wireless networks, and
social networks, is a service that dynamically arranges ad-hoc
shared rides. In the problem of real-time ridesharing with service
guarantees on road networks (hereafter referred to simply as
ridesharing), a set of servers travels over a road network,
cruising when not committed to any service and delivering
passengers otherwise. Requests for rides are received over
time, each consisting of two points, a source and a destination.
Each request also specifies two constraints: a waiting time,
defining the latest time to be picked up, and a service constraint,
defining the acceptable extra detour time relative to the shortest
possible trip duration. When a new request is received, it is
evaluated immediately by all servers. In order to be assigned
the request, a server must satisfy all constraints, both those
of the new request and those of requests already assigned to
the server. The goal is to schedule requests in real time and
minimize the servers' traveling times to complete all of the
committed service while meeting the service quality guarantees.

The traditional dial-a-ride problem [6] aims at designing
vehicle routes and schedules for small to medium-sized trip
and vehicle sets, e.g., a few vehicles serving tens of requests.
Large scale private car sharing and real-time on-demand taxi
or cab sharing are becoming increasingly popular. Increasing
numbers of users use mobile devices or the Internet to request
and participate in these ridesharing services. Tickengo [3],
founded in 2011, is an open ride system where over 50,000 
people participate in ridesharing. Other companies include 
Avego, PickupPal, Zimride, and Zebigo. In an urban city like 
Shanghai, there are approximately 120,000 road intersections, 
40,000 taxis, and more than 400,000 taxi trips per day (these 
numbers are derived from our experimental dataset). A slight
change in weather, such as light rain, can send the city into
gridlock. With the mounting energy, pollution, and congestion
problems in urban and metropolitan areas, which are growing
at tremendous rates and already host more than half of the
entire human population, trading a small amount of privacy
and convenience for energy and cost savings through ridesharing
is promising and perhaps inevitable.

However, providing ridesharing service at the urban scale 
is a non-trivial problem. The core is to devise a real-time 
matching algorithm that can quickly determine the best vehicle 
(taxi, cab, bus) to satisfy incoming service requests. Traditional
solutions to the related dial-a-ride problems using branch-and-bound
[7] or mixed integer programming [8] approaches are
not designed to handle problems of this scale.
Furthermore, most previous solutions focus on scenarios where
requests are known ahead of time and servers originate and finish
at known depots. The dynamic and en route nature of ridesharing
renders many of these algorithms either inapplicable or inefficient.

In this paper, we focus on developing fast matching algorithms
for large scale real-time ridesharing. Our algorithms
are applicable to existing services including taxi services,
private vehicle sharing, elevator systems, minibus services,
and courier services. We note that there are other important
factors which need to be considered for emerging large
scale real-time ridesharing, such as interpersonal (e.g., female vs.
male), safety, social discomfort, and pricing concerns. These
can be addressed by real-name profiling, reputation, or social
network trust building systems [4], [1], and are beyond the scope
of this paper.

A. Problem Definition 

A road network $G = \langle V, E, W \rangle$ consists of a vertex set $V$
and an edge set $E$. Each edge $(u, v) \in E$ ($u, v \in V$) is associated
with a weight $W(u, v)$ which indicates the traveling cost along
the edge $(u, v)$; this traveling cost can be either a time measure
or a distance measure. Assuming driving speeds are available,
time measure and distance measure can often be converted
from one to another and are used interchangeably.

Given two nodes $s$ and $e$ in the road network, a path $p$
between them is a vertex sequence $(v_0, v_1, \ldots, v_k)$ where
$(v_i, v_{i+1})$ is an edge in $E$, $v_0 = s$, and $v_k = e$. The path
cost $W(p) = \sum_{i=0}^{k-1} W(v_i, v_{i+1})$ is the sum of the edge costs
$W(v_i, v_{i+1})$ along the path. The shortest path cost $d(s, e)$ is
defined as the minimal cost over paths linking $s$ to $e$, i.e.,
$d(s, e) = \min_p W(p)$, and the corresponding path with cost
$d(s, e)$ is the shortest path from $s$ to $e$.
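As a concrete illustration of $d(s, e)$, the sketch below computes a shortest path cost with Dijkstra's algorithm over a small adjacency list. The graph encoding and the name `shortest_path_cost` are illustrative assumptions; a deployment on a network with over 100,000 intersections may prefer a precomputed or indexed distance oracle instead.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical adjacency-list encoding: graph[u] = [(v, W(u, v)), ...]
Graph = Dict[str, List[Tuple[str, float]]]


def shortest_path_cost(graph: Graph, s: str, e: str) -> float:
    """Return d(s, e), the minimal path cost from s to e (inf if unreachable)."""
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if u == e:
            return d_u  # first time e is popped, its cost is minimal
        if d_u > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w_uv in graph.get(u, []):
            cand = d_u + w_uv
            if cand < dist.get(v, float("inf")):
                dist[v] = cand
                heapq.heappush(heap, (cand, v))
    return float("inf")


road = {"a": [("b", 2.0), ("c", 5.0)], "b": [("c", 1.0)], "c": []}
print(shortest_path_cost(road, "a", "c"))  # 3.0 (a -> b -> c beats the direct edge)
```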

Definition 1: (Trip Request) A trip $tr = \langle s, e, w, \epsilon \rangle$ with
respect to a road network $G = \langle V, E, W \rangle$ is defined by a
source $s \in V$, a destination $e \in V$, a maximal waiting time
$w$ (the maximum time allowed between making the
request and receiving the service), and a service constraint
$\epsilon$ for the extra detour time in a trip (bounding the overall
distance from $s$ to $e$ by $(1 + \epsilon)d(s, e)$).

We consider a unified waiting time $w$ and service constraint
$\epsilon$ for all requests, which can be specified by the service
provider. However, our proposed algorithms can be easily generalized
to individualized waiting time and service constraints.
We further assume that $G$ is static over time (i.e., we do not
consider different path costs at different times of the day),
but the algorithms we present can handle the case where $G$
changes in a predetermined way, and can be extended
to the case where $G$ changes unpredictably (for example, to
simulate dynamic traffic conditions).

To deal with real-time ridesharing, for each trip $tr_i = \langle s_i, e_i, w, \epsilon \rangle$
and a given server (e.g., a taxi, cab, or private
vehicle), we further introduce $r_i$, the server's location when
the request is made. Given this, a general trip schedule for
a server with $m$ trips can be described as a sequence with
$3m$ elements, $(x_1, x_2, \ldots, x_{3m})$, where an element $x_j$ in the
sequence is either a trip source ($s_i$), a trip destination ($e_i$),
or a trip request point ($r_i$). Furthermore, a server is assumed to
travel along the shortest path in the road network when moving
between any two consecutive points $x_i$ and $x_{i+1}$ in the trip
schedule. Thus, the trip cost $d_T(x_i, x_j)$ between any two points
$x_i, x_j$ in the trip schedule is

$$d_T(x_i, x_j) = d(x_i, x_{i+1}) + d(x_{i+1}, x_{i+2}) + \cdots + d(x_{j-1}, x_j).$$




[Figure 1: trip schedule with server time/location markers (t1, l1), (t2, l2), (t3, l3)]

Fig. 1. Trip Schedule. $s_i$: trip starting point; $e_i$: trip ending point; $r_i$: server
location when the request of trip $tr_i$ comes in; $(t, l)$: the current time and
location of the server.

The overall trip cost is simply $d_T(x_1, x_{3m})$.
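The definition of $d_T$ is a plain sum over consecutive schedule points. The following minimal sketch assumes a distance function `d` is already available (for example, one backed by shortest-path queries); the name `trip_cost` and the 1-based indexing convention are assumptions chosen to mirror the notation above.

```python
from typing import Callable, Sequence


def trip_cost(schedule: Sequence[str], d: Callable[[str, str], float],
              i: int, j: int) -> float:
    """Return d_T(x_i, x_j) for 1-based positions i <= j in the schedule."""
    # Sum d(x_k, x_{k+1}) for k = i, ..., j - 1, converted to 0-based list indices.
    return sum(d(schedule[k - 1], schedule[k]) for k in range(i, j))


# Hypothetical example: points on a line, with d(a, b) = |pos[a] - pos[b]|.
pos = {"r1": 0, "s1": 2, "e1": 7}
d = lambda a, b: abs(pos[a] - pos[b])
sched = ["r1", "s1", "e1"]
print(trip_cost(sched, d, 1, 3))  # 7: d(r1, s1) + d(s1, e1) = 2 + 5
```

With $i = 1$ and $j = 3m$, the same call returns the overall trip cost $d_T(x_1, x_{3m})$.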

Figure 1 illustrates a trip schedule for four trip requests.
Note that since each moving server is associated with a trip
schedule at any given time, we associate two variables $(t, l)$
with a trip schedule to facilitate our discussion, where $t$ is
the current time and $l$ is the current location of the server.
Intuitively, if a trip schedule is being executed by a server,
$(t, l)$ moves along the trip schedule. Note that, at any
given time $t$, each server is associated with a subset of active
trips: the trips whose requests have been accepted (with some
picked up and some not) but that have not yet been dropped off. For instance,
in Figure 1 the active trips are $\{tr_1, tr_2, tr_3\}$ at time $t_1$;
$\{tr_1, tr_2, tr_3, tr_4\}$ at time $t_2$; and $\{tr_1, tr_2, tr_4\}$ at time $t_3$.

However, not all trip schedules can meet the service
quality guarantees of every individual trip request. We
formally introduce the concept of a valid trip schedule.

Definition 2: (Valid Trip Schedule) A valid trip schedule
$S$ for a trip set $TR = \{tr_1, tr_2, \ldots, tr_m\}$ satisfies three
conditions:

1) Point order: For any trip $tr_i$, let $x_{i_1} = r_i$, $x_{i_2} = s_i$,
and $x_{i_3} = e_i$; then $i_1 < i_2 < i_3$, i.e., the request
point must come before the pickup point, which must
come before the trip's ending point;

2) Waiting time constraint: For any trip $tr_i$, the distance
(waiting time) from the server's location when
the request is made to the request's pickup point
must not exceed the waiting time constraint, i.e.,
$d_T(r_i, s_i) \le w$;

3) Service constraint: For any trip $tr_i$, the actual travel
distance $d_T(s_i, e_i)$ from the pickup point to the dropoff point
must be no larger than the shortest distance between them
multiplied by $(1 + \epsilon)$, i.e., $d_T(s_i, e_i) \le (1 + \epsilon)d(s_i, e_i)$.
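The three conditions of Definition 2 translate directly into a checker over a complete schedule. In this minimal sketch, points are encoded as hypothetical `(kind, trip)` tuples with kind `"r"`, `"s"` or `"e"`, locations are numbers, and `d` is an assumed distance function; prefix sums make every $d_T$ lookup constant-time.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[str, int]  # (kind, trip id) with kind in {"r", "s", "e"}


def is_valid_schedule(schedule: List[Point], loc: Dict[Point, float],
                      d: Callable[[float, float], float],
                      w: float, eps: float) -> bool:
    """Check the three conditions of Definition 2 on a complete schedule.

    Assumes every trip in the schedule appears with all three points r, s, e.
    """
    index = {p: k for k, p in enumerate(schedule)}
    # prefix[k] is the cost from x_1 to the (k+1)-th point, so that
    # d_T(p, q) = prefix[index[q]] - prefix[index[p]].
    prefix = [0.0]
    for a, b in zip(schedule, schedule[1:]):
        prefix.append(prefix[-1] + d(loc[a], loc[b]))

    def d_t(p: Point, q: Point) -> float:
        return prefix[index[q]] - prefix[index[p]]

    for trip in {t for _, t in schedule}:
        r, s, e = ("r", trip), ("s", trip), ("e", trip)
        if not index[r] < index[s] < index[e]:         # 1) point order
            return False
        if d_t(r, s) > w:                                # 2) waiting time constraint
            return False
        if d_t(s, e) > (1 + eps) * d(loc[s], loc[e]):   # 3) service constraint
            return False
    return True


# Hypothetical example on a line: two trips served by one server.
loc = {("r", 1): 0, ("s", 1): 1, ("e", 1): 5, ("r", 2): 1, ("s", 2): 2, ("e", 2): 6}
d = lambda a, b: abs(a - b)
good = [("r", 1), ("s", 1), ("r", 2), ("s", 2), ("e", 1), ("e", 2)]
bad = [("r", 1), ("s", 1), ("r", 2), ("s", 2), ("e", 2), ("e", 1)]
print(is_valid_schedule(good, loc, d, w=2, eps=0.25))  # True
print(is_valid_schedule(bad, loc, d, w=2, eps=0.25))   # False: trip 1 detour 6 > 1.25 * 4
```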

To formally define the real-time ridesharing problem, we
further introduce the augmented valid trip schedule. Assume that
at time $t$ there are $m$ active trips for the given server, and let the
current valid trip schedule be $(x_1, x_2, \ldots, x_{3m})$, where $(t, l)$
is between $x_i$ and $x_{i+1}$. For a new trip request $tr_{m+1}$, the
augmented valid trip schedule is $(x'_1, x'_2, \ldots, x'_{3m+3})$,
where $x'_j = x_j$ for $j \le i$, and $x'_{i+1} = r_{m+1}$. In other
words, the augmented valid trip schedule combines the new
request with the existing requests and shares the same partial
trip schedule before the new request is made at time point
$t$. Also, any augmented valid trip schedule consists of two
parts: the finished schedule $(x_1, x_2, \ldots, x_i, r_{m+1})$ and the new
unfinished schedule $(r_{m+1}, x'_{i+2}, \ldots, x'_{3m+3})$.

The problem of Real-Time Ridesharing is: Given a set of
vehicles on the road network $G$ and a new incoming request
$tr$, find the vehicle that minimizes the overall trip cost of the
augmented valid trip schedule.

Note that since the finished part of the augmented
valid trip schedule cannot be changed (it has already
been executed), we essentially need to find the minimal trip
cost of the new unfinished schedule, which includes the $m$
active trips and the new trip request. We also observe that the
minimal cost serves to determine the best match between
the incoming trip request and the available vehicles in real
time. This minimal cost is therefore greedy in nature:
when additional requests come in, the earlier optimal
match between a trip request and a server
may no longer be minimal. However, in the real-time
setting, this type of optimality tends to be the best we can
achieve because future requests are not known in advance, and it can be
easily understood and accepted by riders.

Finally, we note that the problem of real-time ridesharing
is NP-hard, as the classical Hamiltonian path problem can be
reduced to it (assuming all the trips have the same
ending point and are requested at almost the same time). For
brevity, the details of the NP-hardness proof are omitted here.

B. Challenges 

The main challenge in ridesharing is to determine how to 
handle trip requests as they flow into the system in real-time. 
From a server's point of view, for any new request, each server 
may have already selected (and be executing) a trip schedule 
for its existing customers. Given this, how can we quickly help 
it to determine whether it can accommodate a new request? 
Note that in order to respond to such a request, one may have 
to reshuffle the predefined schedule and the reshuffled one has 
to be a valid schedule. 

Furthermore, there might be tens or even hundreds of servers
in the region surrounding the pickup point of a new request.
Clearly, for a trip request, servers that are farther than
$w$ from the pickup location are unable to respond to the
request. Thus, we can reduce the potential candidates
to only those within $w$ of the pickup point. The
customer will then be assigned to the server that offers the shortest
total trip time. Even though potential servers can be filtered
through a dynamic spatial indexing structure [9], [10], [11] on
the moving servers, the existing approaches can still be very
computationally expensive and result in long response times.
In a large metropolitan area such as Shanghai, the number of
requests can be very large, especially during rush hour.
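The filtering step only needs the network distance from each server to the pickup point. As an alternative to the spatial indexes cited above (a swapped-in technique, not the one the text relies on), the sketch below runs a Dijkstra search backwards from the pickup and stops expanding once the distance exceeds $w$. The names `candidate_servers`, `server_at` and the graph encoding are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # u -> [(v, W(u, v)), ...]


def reverse_graph(graph: Graph) -> Graph:
    """Flip every edge so that searching from the pickup finds who can reach it."""
    rev: Graph = {u: [] for u in graph}
    for u, edges in graph.items():
        for v, cost in edges:
            rev.setdefault(v, []).append((u, cost))
    return rev


def candidate_servers(graph: Graph, server_at: Dict[str, str],
                      pickup: str, w: float) -> Set[str]:
    """Servers whose shortest-path distance to `pickup` is at most w."""
    rev = reverse_graph(graph)
    dist = {pickup: 0.0}
    heap = [(0.0, pickup)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale heap entry
        for v, cost in rev.get(u, []):
            nd = du + cost
            # Bounded search: never expand beyond the waiting-time radius w.
            if nd <= w and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return {srv for srv, node in server_at.items() if node in dist}


road = {"a": [("b", 2.0)], "b": [("c", 1.0)], "d": [("c", 10.0)], "c": []}
taxis = {"taxi1": "a", "taxi2": "d", "taxi3": "c"}
print(sorted(candidate_servers(road, taxis, "c", 3.0)))  # ['taxi1', 'taxi3']
```

Because edge costs are non-negative, every prefix of a path of cost at most $w$ also costs at most $w$, so the bounded search still returns exact distances for all nodes within the radius.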

Most algorithms are designed for offline computation. The
existing approaches that use branch-and-bound [12] or integer
programming [8] to schedule new requests do not take the
dynamic nature of the problem into consideration. Testing
whether a new request can be accommodated essentially involves
rescheduling the unfinished trips and the new request
without reusing the computations from the previous round. Their
calculation time was measured in minutes or hours, while we
require millisecond response times.



C. Contributions 

To deal with these challenges, our idea is based on a simple
observation. For a new valid schedule accommodating the new
request $tr_k$, if we simply drop the three points $r_k$, $s_k$, $e_k$ from
the trip schedule, then the resulting trip schedule is still a valid
trip schedule. In other words, only a valid trip schedule can
be extended to accommodate a new request. Given this, a
potential approach for the ridesharing problem is to simply
materialize every valid trip schedule; then, when a new request
arrives, we can check whether any valid trip schedule can be extended
to handle the new request. This approach is promising
because its incremental nature saves many redundant computations:
we do not need to recompute the valid trip schedules
from scratch on each new request. However, in
order to implement such a strategy, we have to deal with
the following challenges: 1) Would the materialization incur
too much memory cost? In other words, can we store the
materialized schedules compactly? 2) How can we efficiently
maintain the materialization? Note that when the server moves,
the materialization needs to be updated. 3) How can the
materialization help to test quickly whether a new request can
be handled? 4) How can the materialization be updated when a
new request is accepted?

This paper makes the following contributions: 

• We formulate the ridesharing problem in a way that
resembles the scenario enabled by current locating and
communication technology. We propose a kinetic tree
approach for the matching problem; the tree structure lends
itself naturally to the dynamic nature of the problem;

• When the pickup or dropoff locations are close to each
other, any permutation of the locations can be valid,
rendering the constraints ineffective and resulting in a
large number of valid schedules. We propose a hotspot-based
algorithm that ignores schedules that are almost
duplicates, effectively reducing the number of valid
schedules while providing a bound on the error of the
solution under certain conditions;

• We compare our approach to the branch-and-bound and
mixed integer programming approaches that are traditionally
used, as well as to the brute-force algorithm.
Experiments on a large Shanghai dataset show that the
tree algorithm is several times to an order of magnitude faster in
response time. We further test the tree algorithm on various
larger problems to show its performance and the effectiveness
of the proposed optimizations.

D. Outline 

We first describe a branch-and-bound and a mixed-integer
programming algorithm to solve the problem in Section II.
We then propose the kinetic tree approach in Section IV. In
Section V, we deal with the issue of large trees using a hotspot-based
algorithm. Experimental results are presented in Section
VI. We discuss related work in Section VII and present our
conclusions in Section VIII.
Fig. 2. Illustration of the Branch-and-Bound Algorithm. (a) When request $r_2$
comes, only $\{e_1, s_2, e_2\}$ need to be scheduled; (b) road network distances and
minimal incident edge costs; (c) when $(r_2, e_1, e_2, s_2)$ with cost 6 is found,
partial schedules with estimated costs above 6 are terminated.



II. Branch-and-Bound and Mixed Integer 
Programming Algorithms 

The brute-force algorithm to find the augmented valid
trip schedules is straightforward: we enumerate all of the
permutations and then check the constraints. However, this
can be expensive. Two traditional approaches that are often
used to solve the related dial-a-ride problem [6] can speed up
execution: a branch-and-bound algorithm [7] and an
integer programming approach [8]. We first propose a modified
branch-and-bound algorithm for our problem, and then formulate
the problem as a mixed-integer programming problem.
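The brute-force baseline can be sketched as follows. To keep the example small, it assumes each constraint has already been converted to a budget measured from the server's current location: `wait_left[j]` for riders still to be picked up, `ride_max[j]` $= (1 + \epsilon)d(s_j, e_j)$ for riders picked up within this schedule, and `ride_left[j]` for riders already on board. All of these names are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, Optional, Sequence, Tuple

Stop = Tuple[str, int]  # ("s", j) pickup or ("e", j) dropoff of trip j


def schedule_cost(order: Sequence[Stop], start: float, loc: Dict[Stop, float],
                  d: Callable[[float, float], float], wait_left: Dict[int, float],
                  ride_max: Dict[int, float], ride_left: Dict[int, float]) -> Optional[float]:
    """Return the cost of visiting `order` from `start`, or None if a constraint fails."""
    t, here, picked_at = 0.0, start, {}
    for kind, j in order:
        t += d(here, loc[(kind, j)])
        here = loc[(kind, j)]
        if kind == "s":
            if t > wait_left[j]:                # waiting time budget exceeded
                return None
            picked_at[j] = t
        elif j in picked_at:
            if t - picked_at[j] > ride_max[j]:  # (1 + eps) * d(s_j, e_j) exceeded
                return None
        elif j in ride_left:
            if t > ride_left[j]:                # on-board rider's remaining budget
                return None
        else:
            return None                         # dropoff before its pickup
    return t


def brute_force(start, stops, loc, d, wait_left, ride_max, ride_left):
    """Enumerate every permutation of the unfinished stops; keep the cheapest valid one."""
    best_cost, best_order = float("inf"), None
    for order in permutations(stops):
        cost = schedule_cost(order, start, loc, d, wait_left, ride_max, ride_left)
        if cost is not None and cost < best_cost:
            best_cost, best_order = cost, list(order)
    return best_cost, best_order


# Rider 1 is on board (dropoff at 5); rider 2 is new (pickup at 2, dropoff at 6).
loc = {("e", 1): 5, ("s", 2): 2, ("e", 2): 6}
d = lambda a, b: abs(a - b)
print(brute_force(0, list(loc), loc, d, wait_left={2: 3}, ride_max={2: 6}, ride_left={1: 6}))
# (6.0, [('s', 2), ('e', 1), ('e', 2)])
```

The loop visits all $n!$ orders, which is what the branch-and-bound and integer programming approaches try to avoid.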

A. Branch and Bound Algorithm

The branch-and-bound algorithm systematically enumerates 
all candidate schedules and organizes the candidates into a 
schedule tree. It estimates and maintains a lower bound of each 
partially constructed schedule and stops building candidate 
schedules that have lower bounds greater than the best solution 
found so far. The algorithm first expands the partial candidate 
with the lowest lower bound (best first search). 

Assume that at time $t$ there are $m$ active trips for the
given server. Let the current valid trip schedule be
$(x_1, x_2, \ldots, x_{3m})$, where $(t, l)$ is between $x_i$ and $x_{i+1}$.
For a new trip request $tr_{m+1}$, we need to reschedule
the pickup and dropoff points $N = \{x_{i+1}, \ldots, x_{3m},
r_{m+1}, s_{m+1}, e_{m+1}\}$. We treat $N$ as a complete graph whose
vertices are the points of $N$ and whose edge weights are the shortest path
distances between them. We attempt to find the schedule
through the graph that passes through each node once but,
unlike a tour, does not return to the first node. The schedule
also has to begin at the location $l$ of the server (which is also
$r_{m+1}$). In Figure 2(a), when request $r_2$ comes in, $s_1$ has already been
picked up. So only $N = \{e_1, s_2, e_2\}$ needs to be scheduled,
and the schedule must start from $r_2$.

We start with the initial schedule tree $ST = \langle r_{m+1} \rangle$ and
initialize the cost of the optimal schedule to $\infty$. We then iteratively
perform a best-first search that expands the partial schedule
$S = \langle r_{m+1}, x_{j_1}, \ldots, x_{j_k} \rangle$ with the minimum lower
bound. The key to a branch-and-bound algorithm is to find
an effective lower bound. The bound we use is $d_T(r_{m+1}, x_{j_k})$
plus the sum of the minimum-cost edges incident to the nodes
that are not yet in the partial schedule $S$.

Figure 2(b) shows the road network costs between pairs of nodes.
The minimal incident edge cost is labeled beside each node. In
Figure 2(c), for each node $x$, the two numbers in parentheses
indicate the cost $d_T(r_2, x)$ of the partial schedule and the
lower bound of any schedule containing the partial schedule
as a prefix. For $(r_2, e_1)$, $d_T(r_2, e_1) = 3$. Only $e_2$ and $s_2$, both
with a minimal incident edge cost of 1, still need to be added to the
schedule, so the lower bound of a schedule with prefix $(r_2, e_1)$
is $d_T(r_2, e_1) + 1 + 1 = 5$.

We attempt to expand the partial schedule $S$ with the minimal
lower bound by adding another new node to construct $S'$. If $S'$ is not
valid or results in a bound greater than the current minimum
schedule cost, we terminate $S'$. If $S'$ is a complete schedule,
we compare its cost to that of the best schedule and update
the best schedule if necessary. Figure 2(c) shows the execution for the example in Figure 2(a).
Once the schedule of cost 6 is found, schedules with lower
bounds above 6 can be pruned (labeled by gray circles). Note
that in the figure we do not illustrate validity constraints. The
complexity of the branch-and-bound algorithm in the worst
case is still exponential.
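A compact best-first branch-and-bound can be sketched with a priority queue of partial schedules. This is an illustration under assumptions, not the authors' implementation: it reuses the hypothetical budget encoding (`wait_left`, `ride_max`, `ride_left`) of the brute-force sketch, and it reads the "minimum-cost incident edge" as the cheapest edge *entering* each unscheduled stop (from the start or another stop), which keeps the bound admissible on directed graphs.

```python
import heapq
from itertools import count
from typing import Callable, Dict, Optional, Sequence, Tuple

Stop = Tuple[str, int]  # ("s", j) pickup or ("e", j) dropoff of trip j


def schedule_cost(order: Sequence[Stop], start: float, loc: Dict[Stop, float],
                  d: Callable[[float, float], float], wait_left: Dict[int, float],
                  ride_max: Dict[int, float], ride_left: Dict[int, float]) -> Optional[float]:
    """Cost of a (partial) schedule from `start`, or None if any constraint fails."""
    t, here, picked_at = 0.0, start, {}
    for kind, j in order:
        t += d(here, loc[(kind, j)])
        here = loc[(kind, j)]
        if kind == "s":
            if t > wait_left[j]:
                return None
            picked_at[j] = t
        elif j in picked_at:
            if t - picked_at[j] > ride_max[j]:
                return None
        elif j not in ride_left or t > ride_left[j]:
            return None  # dropoff before pickup, or on-board budget exceeded
    return t


def branch_and_bound(start, stops, loc, d, wait_left, ride_max, ride_left):
    """Best-first search over partial schedules, pruned by a lower bound."""
    # Cheapest edge entering each stop, from the start or from any other stop.
    min_in = {s: min(d(src, loc[s]) for src in [start] + [loc[o] for o in stops if o != s])
              for s in stops}
    best_cost, best_order = float("inf"), None
    tie = count()  # tie-breaker so the heap never compares schedules directly
    heap = [(float(sum(min_in.values())), next(tie), ())]
    while heap:
        bound, _, order = heapq.heappop(heap)
        if bound >= best_cost:
            break  # best-first: every remaining bound is at least this large
        for s in stops:
            if s in order:
                continue
            new = order + (s,)
            cost = schedule_cost(new, start, loc, d, wait_left, ride_max, ride_left)
            if cost is None:
                continue  # invalid partial schedule: prune the whole branch
            if len(new) == len(stops):
                if cost < best_cost:
                    best_cost, best_order = cost, list(new)
                continue
            bound_new = cost + sum(min_in[o] for o in stops if o not in new)
            if bound_new < best_cost:
                heapq.heappush(heap, (bound_new, next(tie), new))
    return best_cost, best_order


loc = {("e", 1): 5, ("s", 2): 2, ("e", 2): 6}
d = lambda a, b: abs(a - b)
print(branch_and_bound(0, list(loc), loc, d, {2: 3}, {2: 6}, {1: 6}))
# (6.0, [('s', 2), ('e', 1), ('e', 2)])
```

On this input, the partial schedule starting with $e_1$ (bound 8) is never expanded once the complete schedule of cost 6 has been found.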



B. Mixed-integer Programming Approach

Mixed integer programming is a popular alternative. In this
section, we formulate our augmented valid trip schedule
problem as a mixed integer programming problem. Then,
we apply traditional solvers to find the solution.

As in the branch-and-bound algorithm, we are rescheduling
$N = \{x_{i+1}, x_{i+2}, \ldots, x_{3m}, r_{m+1}, s_{m+1}, e_{m+1}\}$. The schedule
must start from $r_{m+1}$. We divide $N$ into subsets: (1)
dropoff locations of trips already picked up but not yet dropped
off; let the size of this set be $k$; (2) pickup locations of
trips not started yet; let the size of this set be $n$; and (3)
dropoff locations of trips not started yet; the size of this
set is also $n$. The problem can be defined on a complete
directed graph $G = (N, A)$ where $N = D' \cup P \cup D \cup \{0\}$,
$D' = \{1, 2, \ldots, k\}$, $P = \{k + 1, k + 2, \ldots, k + n\}$, and
$D = \{k + n + 1, k + n + 2, \ldots, k + 2n\}$. Here we assign an integer
to each point in $N$, while node $0$ represents the current position
$l$ (i.e., $r_{m+1}$) of the server. For a pickup $i$ in $P$, its matching dropoff
in $D$ is $i + n$. A pickup constraint $l_i$ is associated with each node
$i \in P$, representing the latest time at which node $i$ needs to be
picked up. Each arc $(i, j) \in A$ is associated with a shortest
path routing cost $d_{ij}$. For each arc $(i, j)$, let $y_{ij} = 1$ if the
server travels from node $i$ to node $j$. For each drop point
$i \in D' \cup D$, let $L_i$ be the ride time of the request with dropoff
$i$ in this partial route.
where $W_i$ is the waiting time left for $i \in P$ and is the
maximal riding time left for $i \in D'$. Here $d_{ii}$ is set to a positive
number to make sure $y_{ii} = 0$.

The objective is to find the schedule that minimizes the total
cost while satisfying the constraints. Constraint (1) simply
enforces the binary nature of $y_{ij}$. Constraint (2) allows exactly
one node to precede each node other than node 0. Constraint (3)
allows exactly one node to follow node 0. Together, these two
enforce the schedule structure so that each node is visited
exactly once and the schedule starts from node 0.

Constraints (4) and (5) set the earliest time at which a node
can be reached. Constraint (6) defines $L_i$, the service distance,
for dropoff nodes. Constraints (7) and (8) enforce the waiting
time and service constraints for pickup nodes and for dropoff nodes
whose passengers have already been picked up. These are
grouped together because both remaining budgets are measured from
the root node. Constraint (9) enforces the service constraint for
dropoff nodes whose passengers have not yet been picked up,
so that the ride time stays within the bound set by $\epsilon$.

Constraint (5) is not linear. It can be linearized by
introducing constants $M_{ij}$, using an idea similar to the
Miller-Tucker-Zemlin subtour elimination constraints for the traveling
salesman problem [13]:
The validity of these constraints is ensured by setting
$M_{ij} \ge \max\{0, l_i + d_{ij} - e_j\}$, where $l_i$ is the latest time
at which $i$ needs to be served and $e_j$ is the earliest time at which
$j$ needs to be served. For $i \in P$, $[e_i, l_i] = [d_{0i}, W_i]$. For
$i \in D$, $[e_i, l_i] = [d_{0,i-n} + d_{i-n,i}, W_{i-n} + d_{i-n,i}(1 + \epsilon)]$. For
$i \in D'$, $[e_i, l_i] = [d_{0i}, r_i]$.

Let $v$ be the number of variables in the mixed-integer
programming problem and $c$ the number of constraints;
then $v = O(m^2)$ and $c = O(m)$, where $m$ is the total number
of requests being optimized.

IV. Kinetic Tree Approach

The two approaches above both suffer from one fundamental
problem: they essentially reschedule the unfinished pickups
and dropoffs together with the new request from scratch, without
reusing the computations performed before. The structure of
the two algorithms makes it difficult to adapt them to the dynamic
nature of the problem. In this section, we introduce a kinetic
tree structure that maintains and updates the computations
performed so far and uses them effectively when a new
request is issued. However, when there are multiple pickup
or dropoff locations close to each other, the possible number
of schedules inevitably increases exponentially.
We therefore propose a hotspot-based approach in Section V that
reduces the search space and approximates the solution with
bounds.

Fig. 3. Kinetic Tree for Trip Schedules. Darkened path: selected schedule
to be executed; dark circled/squared nodes: finished nodes.


We introduce a kinetic tree structure to maintain all the valid
trip schedules with respect to the server's current location.
When the server moves, a portion of the schedule becomes
obsolete. The root of the tree tracks the current location $l$ of
the server. The rest of the tree records the remaining portion of all the
valid schedules (from the current location onwards).
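A minimal sketch of the tree itself, assuming each node stores one schedule point and a list of children, so that every root-to-leaf path is one valid schedule and shared prefixes are stored once. `KNode` and `schedules` are hypothetical names, and the two-schedule example is only illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Stop = Tuple[str, int]  # ("l", 0) for the current location, ("s", j) / ("e", j) otherwise


@dataclass
class KNode:
    stop: Stop
    children: List["KNode"] = field(default_factory=list)


def schedules(node: KNode) -> List[List[Stop]]:
    """Expand the tree back into its valid schedules (root-to-leaf paths, root excluded)."""
    if not node.children:
        return [[]]
    return [[child.stop] + rest for child in node.children for rest in schedules(child)]


# Two valid schedules sharing the prefix (l, s2); the shared prefix is stored once.
tree = KNode(("l", 0), [
    KNode(("s", 2), [
        KNode(("e", 1), [KNode(("e", 2))]),
        KNode(("e", 2), [KNode(("e", 1))]),
    ]),
])
print(schedules(tree))
# [[('s', 2), ('e', 1), ('e', 2)], [('s', 2), ('e', 2), ('e', 1)]]
```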

For a given $w$ and $\epsilon$, Figure 3 illustrates the kinetic tree
structure corresponding to the complete trip schedule in Figure 1.
The darkened path represents the schedule selected to
be executed by the server. Initially, for the first trip request,
there is only one valid trip schedule (Figure 3(a)). When
the second request arrives, the first customer has already been
picked up by the server. Suppose there are only two
valid options for the server to accept the new request: it needs
to first pick up the second customer, but it can be flexible in
dropping off either of the two passengers. Let us assume it
chooses the shorter one, $(l, s_2, e_1, e_2)$, which
drops off the first customer first (assuming that the option which
drops off the first passenger before picking up the second one is
invalid). However, on its way to pick up the second customer,
the third request arrives. The server now has the option to
pick up either the second customer or the third one. Suppose,
consequently, based on $w$ and $\epsilon$, that there are five possible
valid trip schedules for the server to handle the three trip
requests (trip one is already in progress), shown in Figure 3(c).
Assuming the server decides to move along the shortest route
$(l, s_2, s_3, e_3, e_2, e_1)$ for now and picks up the second customer
first, then when the fourth request arrives after the pickup
of the second customer, the entire right subtree of $r_3$ in
Figure 3(c) becomes inactive. Let us now assume there are
only two possible schedules to accommodate the remaining
trips of all four customers, as shown in Figure 3(d).

Why is such a kinetic tree useful in maintaining the valid
trip schedules? Its advantage rests on the following key
observation:

Lemma 1 (Valid Schedules under Movement): When a 
server reaches a new pickup location or dropoff location 
in the trip schedule, then only those valid schedules which 
contain unfinished trips and share the same prefix so far (from 
the first pickup point of all the unfinished schedules to the 
current location in the trip schedule) need to be materialized. 
All the other schedules will become inactive and can be 
pruned from the tree. 

For example, in Figure 3(c), once the server has actually picked
up the second customer, only the schedules in the left subtree
rooted at $s_2$ remain active. There are two options for performing
the tree pruning. The eager invalidation option tries to determine
as early as possible whether some trip schedules have become inactive.
In other words, it performs the pruning
as the server moves or when the next point in the scheduled
route is reached. The lazy invalidation option performs
such pruning only when necessary, i.e., only when there is a new
incoming request.
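Lemma 1 makes both pruning options cheap when the tree is stored as nodes with children: once the server reaches the next point, the child holding that point becomes the new root and its siblings are discarded. The sketch below reuses a hypothetical `KNode` layout; `advance` models eager invalidation, and `LazyKineticTree` (also hypothetical) only records reached points and defers the same work until the next request.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Stop = Tuple[str, int]


@dataclass
class KNode:
    stop: Stop
    children: List["KNode"] = field(default_factory=list)


def advance(root: KNode, reached: Stop) -> KNode:
    """Eager invalidation: keep only the subtree of the point the server just reached."""
    for child in root.children:
        if child.stop == reached:
            return child  # sibling subtrees are dropped (Lemma 1)
    raise ValueError(f"{reached} is not the next point of any valid schedule")


class LazyKineticTree:
    """Lazy invalidation: record reached points, prune only when a request arrives."""

    def __init__(self, root: KNode) -> None:
        self.root = root
        self.reached: List[Stop] = []

    def on_reach(self, stop: Stop) -> None:
        self.reached.append(stop)  # constant-time bookkeeping while the server drives

    def on_request(self) -> KNode:
        for stop in self.reached:
            self.root = advance(self.root, stop)
        self.reached.clear()
        return self.root


tree = KNode(("l", 0), [KNode(("s", 2), [KNode(("e", 2))]),
                        KNode(("s", 3), [KNode(("e", 3))])])
lazy = LazyKineticTree(tree)
lazy.on_reach(("s", 2))
print(lazy.root.stop)          # ('l', 0): nothing pruned yet
print(lazy.on_request().stop)  # ('s', 2): the s3 branch is gone
```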

B. Handling a New Request 

Now, we consider how to handle a new request
$tr_k = (s_k, e_k)$. We assume that we already
have a materialized prefix tree of all valid and active
schedules of unfinished trips. We need to extend every
valid and active schedule in the prefix tree to a new valid
schedule that includes $tr_k$, if possible. We do this by generating
a new prefix tree based on the existing one. To handle
the new request, we first deal with the pickup location
$s_k$ and then with the dropoff location $e_k$. Essentially, we need to
scan the tree to determine where $s_k$ can be inserted, i.e.,
which edges of the tree can accommodate the insertion of a
new pickup node. All schedules that share the prefix from
the root of the tree to the inserted edge will be inserted into
the new tree. Then we insert $e_k$ after $s_k$ in the new tree.
Furthermore, if $s_k$ or $e_k$ can be inserted at a given location
(an edge in the tree), then we have to find out which trip
schedules containing that edge become invalid with the additional node
and should be pruned from the new tree.
The problem is how to determine 1) at which edges $s_k$ or $e_k$
can be inserted, and 2) how to quickly prune the invalid trip
schedules following that insertion.



Inserting Pickup Location: Here, we focus on whether $s_k$ can
be inserted; $e_k$ can be inserted later in a similar way.
In order to insert $s_k$ in a tree edge, say $(x_i, x_{i+1})$, we need to
deal with the following situations: (a) $s_k$ may be inserted only when
the distance from the current location (recorded in the root node $l$) to the
pickup location $s_k$ satisfies $d_T(l, s_k) = d(l, x_1) + d(x_1, x_2) + \cdots + d(x_i, s_k) \le w$;
(b) the additional travel distance (time) introduced by the detour to $s_k$ may invalidate
some existing trip schedules in the subtree containing the
tree edge $(x_i, x_{i+1})$, i.e., $d(x_i, s_k) + d(s_k, x_{i+1}) - d(x_i, x_{i+1})$
should not be too large. These schedules should be pruned
from the subtree. Note that condition (a) is easy to test
in the existing tree structure.

Lemma 2: ($d_T(l, s_k) \le w$) The shortest distance from the
current location to the requested pickup location $s_k$ is no larger
than $w$. Furthermore, given a prefix (partial) trip schedule
from the root node $l$ to a node $x_j$, i.e., $(l, x_1, x_2, \ldots, x_j)$,
if $d_T(l, s_k) = d(l, x_1) + d(x_1, x_2) + \cdots + d(x_j, s_k) > w$, then
any edge incident to any descendant of $x_j$ in the tree cannot
accommodate $s_k$, i.e., the customer cannot wait for the server to
finish $x_j$ and then pick him up at $s_k$.

This lemma suggests that we can perform either a depth-first
search (DFS) or a breadth-first search (BFS) starting from
the root node of the tree to generate all the candidate edges
$(x_i, x_{i+1})$ for inserting $s_k$. Specifically, the traversal stops
descending once a certain depth is reached, i.e., once a node $x_j$
has the property that $d_T(l, x_j) > w$, we either do not
expand that node (in BFS) or backtrack (in DFS).
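The depth-limited traversal can be sketched as a DFS that carries $d_T(l, x_j)$ as its depth. It assumes tree nodes are annotated with numeric locations and that `d` returns shortest-path distances, so the triangle inequality behind Lemma 2 holds; `Node` and `candidate_edges` are hypothetical names. The cut-off uses the Lemma 2 test $d_T(l, x_j) + d(x_j, s_k) > w$, which prunes at least as early as $d_T(l, x_j) > w$.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    name: str
    loc: float
    children: List["Node"] = field(default_factory=list)


def candidate_edges(root: Node, s_k: float, w: float,
                    d: Callable[[float, float], float]) -> List[Tuple[str, Optional[str]]]:
    """Edges (x_i, x_{i+1}) whose tail can still reach s_k within w.

    A leaf contributes (x_i, None), meaning s_k would be appended at the end.
    """
    out: List[Tuple[str, Optional[str]]] = []

    def dfs(node: Node, depth: float) -> None:  # depth = d_T(l, node)
        if depth + d(node.loc, s_k) > w:
            return  # Lemma 2: no edge at or below this node can host s_k
        if not node.children:
            out.append((node.name, None))
        for child in node.children:
            out.append((node.name, child.name))
            dfs(child, depth + d(node.loc, child.loc))

    dfs(root, 0.0)
    return out


root = Node("l", 0.0, [Node("a", 2.0, [Node("b", 6.0)]), Node("c", 5.0)])
d = lambda x, y: abs(x - y)
print(candidate_edges(root, 3.0, 4.0, d))  # [('l', 'a'), ('a', 'b'), ('l', 'c')]
```

Only waiting-time condition (a) is tested here; condition (b) still has to be checked on the subtree below each candidate edge.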

Now the key problem is how to handle condition (b). The straightforward way to perform pruning is to explicitly maintain and check the constraints for each trip request in the subtree of the node $x_i$. Specifically, for a trip $tr_j$ in the subtree rooted at $x_i$, there are two criteria: the pickup waiting constraint $[r_j, s_j, w]$ ($d_T(r_j, s_j) \le w$) and the trip tolerance constraint $[s_j, e_j, \epsilon]$ ($d_T(s_j, e_j) \le (1+\epsilon) d(s_j, e_j)$). At any given time point $t$, if we need to test whether the detour meets the criteria of trip $tr_j$, then the request has clearly already been issued and responded to, and the trip is not yet completed. Furthermore, only one of the two criteria needs to be tested: if the server has not picked up the customer, we test the pickup waiting constraint $[r_j, s_j, w]$; once the customer is picked up, we test the trip tolerance constraint $[s_j, e_j, \epsilon]$. Thus, at any given point, the "active" customers can be partitioned into two sets: $S_1$ records the customers who still need to be picked up, and $S_2$ records the on-board customers who need to be dropped off. When a new location is reached, we may move customers from $S_1$ to $S_2$ and/or remove customers from $S_2$. For trip $j$ in $S_1$, we test the first criterion $[r_j, s_j, w]$, and for trip $j$ in $S_2$, we test the second one, $[s_j, e_j, \epsilon]$. Given this, for the subtree rooted at $x_i$, the straightforward way is to first generate the two sets $S_1$ and $S_2$. Then, when we insert $s_k$, we need to test that each condition associated with $S_1$ and $S_2$ is still satisfied.
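The straightforward test can be sketched as follows. The sketch assumes that the current values of $d_T(r_j, s_j)$ for the customers in $S_1$, and of $d_T(s_j, e_j)$ together with the direct distance $d(s_j, e_j)$ for the customers in $S_2$, have already been collected for the subtree below the edge. The function names are hypothetical. The detour is added to every active trip, which is why the cost of this check grows with the number of active trips in the subtree.

```python
from typing import Iterable, Tuple

def detour(d_xi_sk: float, d_sk_next: float, d_xi_next: float) -> float:
    """Extra distance of x_i -> s_k -> x_{i+1} compared with going x_i -> x_{i+1} directly."""
    return d_xi_sk + d_sk_next - d_xi_next

def naive_condition_b(extra: float,
                      waiting: Iterable[float],
                      onboard: Iterable[Tuple[float, float]],
                      w: float, eps: float) -> bool:
    """Check every active trip below the edge against the added detour.

    waiting: d_T(r_j, s_j) for each customer j in S1 (not yet picked up)
    onboard: (d_T(s_j, e_j), d(s_j, e_j)) for each customer j in S2 (on board)
    """
    pickups_ok = all(dt + extra <= w for dt in waiting)
    dropoffs_ok = all(dt + extra <= (1 + eps) * direct for dt, direct in onboard)
    return pickups_ok and dropoffs_ok
```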

Algorithm 1 implements the insertion of a new request $tr_k = (s_k, e_k)$ into the tree recursively. The insertion is completed by a call insertNodes(root, {s_k, e_k}, 0). The call to feasible(parent, node, detour) returns whether or not it is feasible to insert node as a child under parent in the tree. First, this ensures that the pickup or service constraint of node is not violated. If min-max filtering is in place (discussed below), this also confirms that the detour (third argument) is tolerable for node.

Algorithm 1 insertNodes algorithm.

Parameter: root node l, request points P = (x_1, x_2, ...), current detour detour

if feasible(l, x_1, detour + d(l, x_1)) then
    Initialize fail = 0
    n = create(l, x_1)
    {Copy feasible child branches underneath n}
    for each c such that edge (l, c) exists do
        copyNodes(n, {c}, d(l, n) + d(n, c) - d(l, c))
        If copy failed, set fail = 1
    end for
    {Insert remaining request points to n}
    if fail = 0 and |P| > 1 then
        {Detour now begins negative because we haven't inserted x_2 yet}
        insertNodes(n, {x_2, ...}, -d(x_1, x_2))
        If insert failed, set fail = 1
    end if
    {Now insert request points into children}
    for each c such that edge (l, c) exists do
        insertNodes(c, P, detour + d(l, c))
        If insert failed, delete (l, c)
    end for
    if fail = 0 then
        Add edge (l, n)
    else if no nodes c with edge (l, c) exist then
        Insert failed, notify caller that this subtree is infeasible
    else
        Insert succeeded
    end if
else
    Insert failed, notify caller that this subtree is infeasible
end if

The copyNodes(node, source, detour) function recursively copies nodes from a set of nodes, source, to the target node, node. As for the root's children in insertNodes, tolerance is checked through calls to feasible with the given detour. copyNodes fails if all of the children of node lie along infeasible paths; in this case, these branches and node are deleted.
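For illustration only, the following Python sketch reproduces the effect of insertNodes/copyNodes on a toy scale. It inserts the ordered request points into every possible position of every schedule stored in the tree and keeps only the feasible branches. It is not the authors' C++ implementation, and it makes several hypothetical simplifications:

* Each stop carries a fixed `deadline` on the cumulative distance from the server's current location, instead of the $[r_j, s_j, w]$ and $[s_j, e_j, \epsilon]$ constraints.
* Euclidean distance replaces road-network distance.
* `Node`, `insert`, `copy_feasible`, and `schedules` are illustrative names.

A return value of `None` means the subtree is infeasible, while `[]` means the node is a feasible leaf.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    loc: Point
    deadline: float = math.inf            # latest cumulative distance at which loc may be reached
    children: List["Node"] = field(default_factory=list)

def copy_feasible(node: Node, arrive: float) -> Optional[List[Node]]:
    """Copy the branches below `node` (reached at distance `arrive`) that stay feasible."""
    if not node.children:
        return []                         # node is a leaf: the path ending here is feasible
    kept = []
    for c in node.children:
        t = arrive + math.dist(node.loc, c.loc)
        if t <= c.deadline:
            kids = copy_feasible(c, t)
            if kids is not None:
                kept.append(Node(c.loc, c.deadline, kids))
    return kept or None                   # every branch failed: prune this subtree

def insert(node: Node, arrive: float, pts: List[Node]) -> Optional[List[Node]]:
    """New children of `node` after inserting `pts`, in order, somewhere below it."""
    if not pts:
        return copy_feasible(node, arrive)
    first, rest = pts[0], pts[1:]
    out = []
    # Case 1: visit `first` right after `node`, then insert `rest` among the old children.
    t = arrive + math.dist(node.loc, first.loc)
    if t <= first.deadline:
        bridge = Node(first.loc, first.deadline, node.children)
        kids = insert(bridge, t, rest)
        if kids is not None:
            out.append(Node(first.loc, first.deadline, kids))
    # Case 2: keep an existing child next and insert all of `pts` further down.
    for c in node.children:
        tc = arrive + math.dist(node.loc, c.loc)
        if tc <= c.deadline:
            kids = insert(c, tc, pts)
            if kids is not None:
                out.append(Node(c.loc, c.deadline, kids))
    return out or None

def schedules(node: Node, prefix: Tuple[Point, ...] = ()) -> Iterator[Tuple[Point, ...]]:
    """Enumerate root-to-leaf paths, i.e. the candidate schedules stored in the tree."""
    prefix = prefix + (node.loc,)
    if not node.children:
        yield prefix
    for c in node.children:
        yield from schedules(c, prefix)
```

Setting `root.children = insert(root, 0.0, [pickup, dropoff])` plays the role of the top-level call insertNodes(root, {s_k, e_k}, 0). A `None` result means this vehicle cannot serve the request, so its old tree is kept.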

In Figure 4 (a), we use the insertion algorithm to insert the pickup location $s_3$ into an existing tree, thereby generating a new tree. $s_3$ is first inserted directly below $r$. Then, the branch rooted at $s_2$ is copied underneath this new $s_3$ node, forming the new tree $(r, s_3, s_2, ((e_1, e_2), (e_2, e_1)))$. Let us assume the route $(r, s_3, s_2, e_1, e_2)$ is not feasible; then the branch is pruned from the tree, starting at the leaf node, until we reach $s_2$, where we have an alternate path $(r, s_3, s_2, e_2, e_1)$ that is feasible. This pruning occurs in the copyNodes algorithm, which succeeds because $s_3$ falls along at least one (in this case, exactly one) feasible path. The resulting tree is shown in Figure 4 (b).

Then, the insertion algorithm moves down to $s_2$ and attempts to insert the pickup location after it. Two paths are formed, $(r, s_2, s_3, e_1, e_2)$ and $(r, s_2, s_3, e_2, e_1)$, as a result of inserting $s_3$ between $s_2$ and $e_1$ and between $s_2$ and $e_2$, respectively. The resulting tree is shown in Figure 4 (c). Suppose inserting $s_3$ between $e_1$ and $e_2$ or between $e_2$ and $e_1$ is infeasible. Then, we have the tree in Figure 4 (d). To complete the insertion of $(s_3, e_3)$, we now try to insert $e_3$ into the subtrees rooted at $s_3$, following the same insertion algorithm. Once this completes, we arrive at the tree shown in Figure 4 (e).

Fig. 4. Tree Insertion. The insertions of $s_3$ into each edge of the tree in (a) result in the new trees in (b), (c), (d), and (e), assuming the last two insertions were infeasible.

Min-max Filtering using Slack time: Though the above test for condition (b) is conceptually simple, it is computationally expensive. We now introduce a fast approach to simplify and speed up this test. For any node $j$, if $j$ is in $S_1$, let $\delta_j = w - d_T(r_j, s_j)$; otherwise ($j$ is in $S_2$), let $\delta_j = (1+\epsilon) d(s_j, e_j) - d_T(s_j, e_j)$. Then, with the node $x_j$ we associate the slack time $\Delta_{x_j} = \min(\delta_{x_j}, \max_{x \in \mathrm{children}(x_j)} \Delta_x)$.

Note that $\Delta_{x_j}$ essentially represents the minimal allowed detour on the most "lenient" route of the subtree rooted at $x_j$. Here, "lenient" means the route can tolerate the most detour compared to other routes. Given this, we introduce the following theorem, which gives a simple condition to determine whether $s_k$ can be inserted at a given edge.

Theorem 1: For a trip request $tr_k$, if edge $(x_i, x_{i+1})$ violates either of the following conditions: (a) $d_T(l, s_k) = d(l, x_1) + d(x_1, x_2) + \cdots + d(x_i, s_k) \le w$; (b) $d(x_i, s_k) + d(s_k, x_{i+1}) - d(x_i, x_{i+1}) \le \Delta_{x_{i+1}}$, then we cannot add the pickup $s_k$ between locations $x_i$ and $x_{i+1}$.

After insertion into $(x_i, x_{i+1})$, all nodes $j$ under $x_i$ in the new tree are tested for the constraint $\delta_j \ge d(x_i, s_k) + d(s_k, x_{i+1}) - d(x_i, x_{i+1})$. A branch is pruned from the subtree if the constraint is not satisfied.
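A minimal sketch of the slack bookkeeping and the Theorem 1 test follows. It assumes each tree node already stores its $\delta_j$ as defined above. `TNode`, `compute_slack`, and `passes_theorem1` are hypothetical names, and distances are passed in precomputed. A leaf's slack is its own $\delta$, since the maximum over an empty set of children is taken as $+\infty$. Theorem 1 is only a necessary condition. Passing it means at least one route below $x_{i+1}$ can absorb the detour. The per-branch test with $\delta_j$ described above still prunes the remaining branches.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class TNode:
    delta: float                                  # delta_j of the stop at this node
    children: List["TNode"] = field(default_factory=list)
    slack: float = math.inf                       # Delta_{x_j}, set by compute_slack

def compute_slack(node: TNode) -> float:
    """Delta_x = min(delta_x, max over children c of Delta_c), computed bottom-up."""
    best_child = max((compute_slack(c) for c in node.children), default=math.inf)
    node.slack = min(node.delta, best_child)
    return node.slack

def passes_theorem1(dT_l_xi: float, d_xi_sk: float, d_sk_next: float,
                    d_xi_next: float, w: float, next_slack: float) -> bool:
    """Necessary test for inserting s_k on edge (x_i, x_{i+1}).

    dT_l_xi is d_T(l, x_i); next_slack is Delta_{x_{i+1}}.
    """
    cond_a = dT_l_xi + d_xi_sk <= w                            # condition (a)
    cond_b = d_xi_sk + d_sk_next - d_xi_next <= next_slack     # condition (b)
    return cond_a and cond_b
```

Condition (b) is a single comparison per edge, whereas the straightforward test visits every active trip in the subtree.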

Updating $\Delta$ and Tree: After we try to insert a request into all possible servers, we get a set of new trees. For each tree, we can find the shortest route, and we choose the tree that provides the shortest route among all trees. Only the chosen tree needs to have its $\Delta$ values updated, which can be done in one tree traversal. When a server is moving, the tree needs to be updated as well. However, the $\Delta$ values are unaffected by server movement and do not need to be updated. The tree is updated as follows:

• Vehicles follow their routes and update the server when a new pickup or dropoff location is reached; the server drops the inactive portion of the tree accordingly;

• Many moving object indexing methods have been proposed, including the RUM-tree, TPR-tree, B^x-tree, B^dual-tree, and STRIPES. Indexing can substantially reduce the cost of searching for candidate taxis. However, a trade-off must be made between maintaining a complex, search-efficient index and relying on a search-approximate but easy-to-maintain index. In our dataset, around 17,000 taxis update their locations every 20 to 60 seconds. We choose to use a simple grid-based spatial index. The index is updated when a vehicle crosses the boundary of its index cell. For each request, the index identifies the vehicles possibly within $w$ of the request, queries these vehicles' actual locations, and then tests whether they can accommodate the request.
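A minimal sketch of such a grid index follows. It makes the following assumptions:

* Coordinates are planar and in meters.
* Straight-line distance never exceeds road-network distance. Under this assumption, the vehicles in the cells covering the square of half-width $w$ around a request form a superset of the vehicles that can reach it within $w$.
* `GridIndex`, its methods, and the cell size are hypothetical choices, not the implementation used in the experiments.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple

Point = Tuple[float, float]

class GridIndex:
    """Uniform-grid index over vehicle positions."""

    def __init__(self, cell: float):
        self.cell = cell
        self.cells: Dict[Tuple[int, int], Set[Hashable]] = defaultdict(set)
        self.where: Dict[Hashable, Tuple[int, int]] = {}   # vehicle id -> current cell

    def _key(self, p: Point) -> Tuple[int, int]:
        return (math.floor(p[0] / self.cell), math.floor(p[1] / self.cell))

    def update(self, vid: Hashable, p: Point) -> bool:
        """Record a location report; only crossing a cell boundary touches the index."""
        k = self._key(p)
        old = self.where.get(vid)
        if old == k:
            return False                  # still inside the same cell: nothing to do
        if old is not None:
            self.cells[old].discard(vid)
        self.cells[k].add(vid)
        self.where[vid] = k
        return True

    def candidates(self, p: Point, w: float) -> Set[Hashable]:
        """Vehicles in every cell that intersects the square of half-width w around p."""
        r = math.ceil(w / self.cell)
        cx, cy = self._key(p)
        out: Set[Hashable] = set()
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                out |= self.cells.get((cx + dx, cy + dy), set())
        return out
```

Most location reports leave a vehicle in its current cell and therefore cost one dictionary lookup. This is the "easy to maintain" side of the trade-off described above. The exact feasibility test is deferred to the candidate vehicles.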

V. Hotspot Based Optimization

The main problem with the basic tree algorithm is the exponential growth of the tree size when there are multiple pickup or dropoff locations close to each other. For example, if 8 pickups occur in spatial proximity at around the same time, e.g., at airport terminals, any permutation of the pickups may result in a valid schedule. So there are already 8! = 40,320 possibilities without considering the dropoff points. We propose an approximation approach with a bound to reduce the search space. The idea is that when the time and space required to compute the best schedule are too large, a server may decide to shed load by maintaining only a subset of the schedules. Since the number of leaves of the kinetic tree is determined by the number of possible routes, the tree size is effectively controlled by the approximation and the service constraints.

Fig. 5. Bound for Hotspot. $x_i$, $x_j$, and $x_k$ are in one hotspot. Black lines: optimal schedule $S_{best}$. We can convert $S_{best}$ by connecting $x_i$, $x_j$, $x_k$ consecutively first and then threading the other locations (represented by ovals). The new schedule has a bounded cost.

We propose the following hotspot clustering algorithm to deal with this situation. When we insert a pickup point $s_k$ into an edge $(x_i, x_{i+1})$, we check whether $d(x_{i+1}, s_k) \le \delta$, where $\delta$ is a small number. If so, $s_k$ is inserted into the node of $x_{i+1}$: $s_k$ and $x_{i+1}$ are treated as one point, called a hot spot, in the tree, and an arbitrary schedule is chosen among the points in a hot spot. When the hot spot contains more than one point, the newly inserted point needs to be within $\delta$ of all the points of the hot spot. A similar procedure can be applied to dropoff points and to mixtures of pickups and dropoffs. Once the point is combined with any node, we stop trying to insert it into any other edges.
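The hot-spot membership test can be sketched as below, assuming planar points and Euclidean distance. `join_hotspot` and its fallback are hypothetical simplifications. The fallback opens a new singleton hot spot. In the algorithm above, a point that joins no hot spot is instead inserted into the tree edges as usual.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def join_hotspot(hotspots: List[List[Point]], p: Point, delta: float) -> int:
    """Merge p into the first hot spot whose members are all within delta of p.

    Returns the index of the hot spot that now contains p. If none qualifies, p
    starts a new singleton hot spot (a stand-in for ordinary edge insertion).
    """
    for i, members in enumerate(hotspots):
        if all(math.dist(p, q) <= delta for q in members):
            members.append(p)          # combined with a node: stop trying other edges
            return i
    hotspots.append([p])
    return len(hotspots) - 1
```

Requiring $p$ to be within $\delta$ of every member, not just one, keeps all hot spot points pairwise within $\delta$. The bound below relies on this property.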

Let us first assume the service constraints are sufficiently loose that all schedules are possible. For a trip set $TR$, let $S_{best}$ be the optimal schedule. Suppose there is a hotspot $hp$ among the pickup and dropoff locations of $TR$. Our hotspot-based method chooses an arbitrary schedule $T_{hs}$ that goes through the points of the hotspot consecutively. We want to prove that the cost of $T_{hs}$ is bounded.

Theorem 2: $cost(T_{hs}) \le cost(S_{best}) + 2(m+1)\delta$, where $m$ is the number of points in the hotspot, without considering constraints.

Proof Sketch: We prove the case $m = 3$ by illustration; for general $m$, the proof is essentially the same. In Figure 5 (a), assume $\{x_i, x_j, x_k\}$ have pairwise distances no greater than $\delta$. The optimal schedule $S_{best}$ is labeled by black solid and dashed lines. We can convert $S_{best}$ into $T_{hs}$ by connecting $x_i, x_j, x_k$ consecutively first and then threading the other locations (represented by ovals) in $S_{best}$, as shown by the red lines and black dashed lines. We first prove (1) $cost(T_{hs}) \le cost(S_{best}) + 3\delta$, which is equivalent to proving $a' + b' + c' + d' + e' \le a + b + c + d + e + 3\delta$, since the dashed lines are common to both schedules.

We know $d' \le b + c$ and $e' \le d + e$. Now we only need to show $a' + b' + c' \le a + 3\delta$. As shown in Figure 5 (b), we can easily prove that $a' \le a + \delta$, because the shortest path between $x_k$ and $x_{i+1}$ is no longer than going from $x_k$ to $x_i$ (a distance of at most $\delta$) and then following $a$. Because $b' \le \delta$ and $c' \le \delta$, we know $a' + b' + c' \le a + 3\delta$.

However, the hotspot algorithm may not visit $x_i, x_j, x_k$ in the same order as the optimal solution, since it uses an arbitrary order. We now prove that (2) for any two hotspot-based schedules $S_{hs}$ and $S_{hs'}$, $cost(S_{hs}) \le cost(S_{hs'}) + (m+1)\delta$, where $m = 3$. Without loss of generality, let $S_{hs} = \ldots, x_{i-1}, x_i, x_j, x_k, x_{k+1}, \ldots$ and $S_{hs'} = \ldots, x_{i-1}, x_j, x_k, x_i, x_{k+1}, \ldots$. It is obvious that $d(x_{i-1}, x_i) \le d(x_{i-1}, x_j) + \delta$ and $d(x_k, x_{k+1}) \le d(x_i, x_{k+1}) + \delta$. Also, $d(x_i, x_j) \le d(x_j, x_k) + \delta$ and $d(x_j, x_k) \le d(x_k, x_i) + \delta$. Adding the inequalities together, we have $cost(S_{hs}) \le cost(S_{hs'}) + 4\delta$.

Putting (1) and (2) together, we have $cost(S_{hs}) \le cost(S_{best}) + (2m+1)\delta$, where $m = 3$. □

Because, after building the whole tree, we select the shortest hotspot-based schedule $S_{hsBest}$, it is obvious that

$cost(S_{hsBest}) \le cost(S_{best}) + 2(m+1)\delta$.
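As a numeric sanity check of Theorem 2, the brute-force script below compares two schedules on small random Euclidean instances: the best schedule overall, and the best schedule that visits a 3-point hotspot consecutively in one fixed, arbitrary order. This is not a proof. As in the theorem, service constraints are ignored, and so is the pickup-before-dropoff precedence. Euclidean points stand in for road-network distances, and all names are hypothetical.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def path_cost(start: Point, seq: Sequence[Point]) -> float:
    """Length of the open path start -> seq[0] -> seq[1] -> ... (Euclidean)."""
    total, prev = 0.0, start
    for p in seq:
        total += math.dist(prev, p)
        prev = p
    return total

def best_cost(start: Point, pts: List[Point]) -> float:
    """Optimal schedule by brute force over all visiting orders (no constraints)."""
    return min(path_cost(start, perm) for perm in itertools.permutations(pts))

def best_hotspot_cost(start: Point, others: List[Point], hotspot: List[Point]) -> float:
    """Best schedule that visits the hotspot consecutively, in one fixed arbitrary order."""
    block = tuple(hotspot)
    best = math.inf
    for perm in itertools.permutations(others):
        for pos in range(len(perm) + 1):
            best = min(best, path_cost(start, perm[:pos] + block + perm[pos:]))
    return best

def worst_gap(trials: int = 20, n_other: int = 4, m: int = 3,
              delta: float = 5.0, seed: int = 1) -> float:
    """Largest observed cost(hotspot schedule) - cost(optimal schedule)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        cx, cy = rng.uniform(0, 100), rng.uniform(0, 100)
        hotspot = []
        for _ in range(m):
            # points within delta/2 of a common center are pairwise within delta
            r, a = rng.uniform(0, delta / 2), rng.uniform(0, 2 * math.pi)
            hotspot.append((cx + r * math.cos(a), cy + r * math.sin(a)))
        others = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n_other)]
        gap = (best_hotspot_cost((0.0, 0.0), others, hotspot)
               - best_cost((0.0, 0.0), others + hotspot))
        worst = max(worst, gap)
    return worst
```

With $m = 3$ and $\delta = 5$, the observed gap should stay below $(2m+1)\delta = 35$, which is itself below the stated $2(m+1)\delta = 40$. The script only illustrates the bound on random instances; it does not replace the proof.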

When we consider the constraints, the hotspot-based schedule corresponding to $S_{best}$ may violate some constraints and thus may not exist. However, when the constraints of the points of the best schedule are relaxed, the corresponding hotspot-based schedule will be found. We have the following theorem.

Theorem 3: $cost(S_{hs}) \le cost(S_{best}) + 2(m+1)\delta$, where $m$ is the number of points in the hotspot, when the slack time of every point in $S_{best}$ is larger than $m\delta$.

Proof Sketch: Again we prove the case $m = 3$ for ease of illustration. In Figure 5 (a), if $(a, p, b, c, q, d, e)$ is a valid partial schedule with each node having at least $3\delta$ slack time, then $(a', b', c', p, d', q, e')$ is a valid partial schedule.

For any point on $p$, the extra delay is $a' + b' + c' - a \le 3\delta$. For any point on $q$, the extra delay is $a' + b' + c' + d' - a - (b + c) \le a' + b' + c' - a \le 3\delta$. For $x_{i+1}$, the extra delay is $(a' + b' + c' + d' + p + q) - (a + b + c + d + p + q)$, which is shown in the proof of Theorem 2 to be no larger than $3\delta$. □

When $\delta$ is sufficiently small, we are likely to find a schedule whose cost is bounded by the cost of the best schedule plus a small additional time.

VI. Experimental Design 

Data for the experiments is based on trips of 17,000 Shanghai taxis for one day (May 29, 2009); the dataset contains 432,327 trips. Each trip t includes the starting and destination coordinates t.s and t.e and the start time t.time (i.e., the time at which the taxi picked up the passengers for the trip).

A simulation framework submits trip requests to the system in real time based on the trips. Specifically, for each trip t, a trip request is initialized as $tr = \langle s_i = t.s, e_i = t.e \rangle$, and tr is submitted at time t.time in the simulator.

An instance of the route algorithm is associated with each vehicle. When a new trip request is received, the simulator tries the request with each vehicle and then chooses the vehicle returning the minimum time; the request is then assigned to that vehicle.
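The assignment step amounts to an argmin over vehicles. The sketch assumes a hypothetical `cost(vehicle, request)` callback that runs the vehicle's route algorithm and returns the resulting time, or `None` when the vehicle cannot serve the request within the constraints. The `vehicles` argument may be the whole fleet or only the candidates returned by a spatial index.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

V = TypeVar("V")
R = TypeVar("R")

def dispatch(request: R, vehicles: Iterable[V],
             cost: Callable[[V, R], Optional[float]]) -> Optional[Tuple[V, float]]:
    """Return (vehicle, time) with the minimum time, or None if no vehicle can serve."""
    best: Optional[Tuple[V, float]] = None
    for v in vehicles:
        c = cost(v, request)          # None: v cannot serve the request within the constraints
        if c is not None and (best is None or c < best[1]):
            best = (v, c)
    return best
```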

The simulation framework executes on an undirected, weighted graph derived from and representing the Shanghai road network. This graph contains 122,319 vertices and 188,426 edges. The starting and destination trip coordinates are pre-mapped to the closest vertex in the graph. This may introduce some inaccuracy if the coordinates are in the middle of a street or do not match the road network data, but this inaccuracy is negligible. A vehicle is initialized at a random vertex in the city and then follows a given route when there are customers on board; otherwise, it follows the current road segment (at intersections, the next segment to follow is chosen randomly). We assume that the speed in the road network is a constant 14 meters/second (approximately 50 kilometers/hour). Thus, most computations, such as finding the path with the shortest duration, are done in terms of distance instead of time.

The framework is implemented in C++. We run the experiments on cluster nodes with an Intel Xeon X5550 (2.67 GHz) processor. The simulation implementation is single-threaded, so only one core of the CPU is used. We limit the memory usage of each simulation process to three gigabytes.

To simulate the taxis, we need both the distances and the routes (which vehicles follow) between vertices in the road network. Computing shortest paths on road networks has been widely studied (see [?] for an extensive review). A variety of techniques [?] have been developed, such as A* and arc flags (directing the search towards the goal), highway hierarchies (building shortcuts to reduce the search space), transit node routing (using a small set of vertices to relay the shortest path computation), and spatial data structures that aggressively compress the distance matrix. Recently, Abraham et al. [?] discovered that several of the fastest distance computation algorithms need the underlying graphs to have small highway dimension. Furthermore, they demonstrate that the method with the best time bounds is actually a labeling algorithm [?]. We implement the state-of-the-art hub-labeling algorithm, a fast and practical algorithm that heuristically constructs a distance labeling on large road networks, where each vertex records a set of intermediate vertices (and its distances to them) for the shortest path computation [?]. For the purpose of tracking taxi locations, a second version of the road network is stored in memory in a weighted adjacency list structure without additional information.
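The query side of hub labeling can be sketched as follows. With a 2-hop cover, the distance between $s$ and $t$ is the minimum of $d(s, h) + d(h, t)$ over hubs $h$ that appear in both labels, which holds because the graph is undirected. Label construction, which is the hard part, is not shown. The labels below are hand-built for a three-vertex path a-b-c with edge weights 2 and 3, and `hub_distance` is a hypothetical name.

```python
import math
from typing import Dict, Hashable

def hub_distance(label_s: Dict[Hashable, float], label_t: Dict[Hashable, float]) -> float:
    """d(s, t) = min over common hubs h of d(s, h) + d(h, t); inf if no hub is shared."""
    if len(label_s) > len(label_t):
        label_s, label_t = label_t, label_s      # scan the smaller label
    best = math.inf
    for hub, d_s in label_s.items():
        d_t = label_t.get(hub)
        if d_t is not None:
            best = min(best, d_s + d_t)
    return best

# Hand-built labels for the path a --2-- b --3-- c (every pair shares a hub on its shortest path).
labels = {"a": {"a": 0.0, "b": 2.0}, "b": {"b": 0.0}, "c": {"c": 0.0, "b": 3.0}}
```

A query touches only the two labels, independent of the graph size. This is what makes labeling attractive when distances are requested very frequently.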

However, for large-scale ridesharing, the shortest path algorithm is called very frequently and can become the bottleneck if not implemented efficiently. We observe that the repeated calls follow a pattern that preserves locality. So, we implement two LRU caches, each using a single hash table: one stores up to ten million shortest distances and the other up to ten thousand shortest paths (separate caches are used because more distances than paths can be stored in memory, and shortest distances are needed more often than shortest paths). Both caches are indexed only by the starting and destination points of a distance or path computation call; this is accomplished by defining the index for two vertices $s$ and $e$ as $i = id(s) \cdot |V| + id(e)$, where $id$ returns an integer representation of a vertex.
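A minimal sketch of one such cache follows. It uses Python's `collections.OrderedDict`, a hash table that remembers insertion order, to obtain LRU eviction. `PairLRU` is a hypothetical name. Vertex ids are assumed to be integers in $[0, |V|)$, so that the key $id(s) \cdot |V| + id(e)$ is unique. The actual system is written in C++.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

class PairLRU(Generic[T]):
    """LRU cache for per-(source, destination) results such as distances or paths."""

    def __init__(self, capacity: int, num_vertices: int):
        self.capacity = capacity
        self.n = num_vertices
        self.data: "OrderedDict[int, T]" = OrderedDict()

    def key(self, s: int, e: int) -> int:
        return s * self.n + e                     # i = id(s) * |V| + id(e)

    def get(self, s: int, e: int, compute: Callable[[int, int], T]) -> T:
        k = self.key(s, e)
        if k in self.data:
            self.data.move_to_end(k)              # hit: mark as most recently used
            return self.data[k]
        value = compute(s, e)                     # miss: run the (expensive) query
        self.data[k] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)         # evict the least recently used entry
        return value
```

Two instances, with capacities of ten million (distances) and ten thousand (paths), would correspond to the configuration described above.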



TABLE I
Parameters for four-algorithm comparison.

Parameter            Tested settings
Capacity             4 (default)
Constraints          5 min / 10%; 10 min / 20% (default); 15 min / 30%; 20 min / 40%; 25 min / 50%
Number of servers    1,000; 2,000; 5,000; 10,000 (default); 20,000



A. Four Algorithm Comparison 

We first compare the kinetic tree algorithm with the branch-and-bound, brute-force, and mixed-integer programming algorithms on the dataset of 432,327 trips.

We choose three important parameters: capacity, waiting time and tolerance constraints, and the number of taxis/servers. We first establish reasonable defaults for the parameters and then modify the parameters one at a time to evaluate their effect. The defaults and other tested settings are shown in Table I. Note that a waiting time constraint of 10 minutes corresponds to 8,500 meters.

Fig. 6. Results for changing number of customer requests (a), constraints (b), and number of servers (c). Default parameters are 10 min / 20% for the constraints, 10,000 servers, and a capacity of 4.

To evaluate performance, we measure the average customer response time (ACRT), i.e., the average time required to complete the search for the minimum time needed to satisfy a new request. We further measure the average response time (ART), i.e., the average time needed to calculate the best route for a taxi to follow given its current state, for different request sizes. Depending on the number of requests that need to be scheduled, the ART can change significantly (for example, a taxi with twenty current requests would have forty more points to schedule than one with no assigned requests). Thus, we calculate the ART separately for different current request sizes and then compare them to see the effect of the number of current requests on response time.

The default taxi capacity is set at four, both to mimic real-world situations and because a higher capacity prevents the other algorithms from finishing within a reasonable time. Additionally, rather than testing lower capacities, we use the ART to find the effect of different problem sizes on the efficiency of the algorithms.

Figure 6 shows the four-algorithm comparison. Figure 6 (a) shows the ART with different numbers of requests. Figures 6 (b) and (c) show the ACRT for varying constraints and fleet sizes, respectively. Generally, the brute-force and branch-and-bound algorithms exhibit roughly the same performance. The mixed-integer programming approach takes significantly more time, probably because of the execution time spent initializing and preprocessing each mixed-integer programming problem. The tree algorithm outperforms the other algorithms in all test cases, due to its incremental approach.

For a small number of taxis and a large number of customers already scheduled to a taxi, branch and bound outperforms brute force. The reason is most likely that the pruning effect of branch and bound matters more when the shortest route calculations involve more customer requests, because the problem size is larger. When the problem size is small, the fast initialization of the brute-force algorithm is preferable; branch and bound, on the other hand, has to first calculate the minimum edges for each of the vertices in the complete graph of pickup and dropoff points that it uses.

For the default parameters, the execution times of the branch-and-bound and brute-force algorithms are almost identical, while mixed-integer programming is approximately 20 times slower. The tree algorithm, on the other hand, is almost two times faster than the branch-and-bound algorithm. Execution time differences of similar magnitude are seen for the other parameters.
Fig. 8. ART for four customer requests when changing constraints (a) and number of servers (b). Default parameters are 10 min / 20% for the constraints, 10,000 servers, and a capacity of 4.

Fig. 7. Results with tree algorithms for different numbers of customer requests (a), changing constraints (b), and changing number of servers (c). Default parameters are 10 min / 20% for the constraints, 2,000 servers, and a capacity of 6 ((a) shows ART in different cases for these parameters).

In Figure 8, we show the ART when shortest routes are being calculated for four customer requests, to get an idea of performance for larger sizes. These are graphed as the constraints or the number of servers increase. For constraints, the ART gradually increases as the constraints become looser. This makes sense because there are more feasible combinations to consider. Additionally, the brute-force algorithm appears less influenced by this trend, probably because it enumerates every permutation anyway (the constraints still affect its ART because it can stop earlier on average when checking the feasibility of each permutation).

The increasing number of servers appears to have little effect on the tree algorithm, but for the other three algorithms, the ART clearly decreases. This is most likely because, with more servers, most of the cases with four passengers occur when the pickup and dropoff points of the passengers are not very clustered: there will often be an empty server close to a server with several passengers, and it would get the next request. Because the pickup and dropoff points are farther apart, there are fewer combinations and the execution time is lower. On the other hand, when there are few servers, the servers are spread farther apart, meaning that a single server is more likely to handle several clustered requests.

B. Comparing Tree Algorithms 

We further evaluate different versions of our tree algorithm: the basic tree algorithm, the slack time algorithm, and the hot-spot clustering algorithm (which also uses slack time).



TABLE II
Parameters for tree algorithm comparison.

Parameter            Tested settings
Capacity             3; 4; 5; 6 (default); 7; 8; 12; 16; unlimited
Number of servers    500; 1,000; 2,000 (default); 5,000; 10,000
Constraints          5 min / 10%; 10 min / 20% (default); 15 min / 30%; 20 min / 40%; 25 min / 50%



We set a default capacity of six because we find that the tree algorithms are able to solve larger problems than the other approaches. The incremental nature of the tree algorithms and the improvements in the hot-spot clustering algorithm allow us to explore the effect of unlimited capacity (for most of the other algorithms, we find that this leads to too large a search space), which gives an idea of the maximum achievable ridesharing. The parameters we use for evaluating the tree algorithms are shown in Table II; the defaults are a capacity of 6, 2,000 servers, and 10 min / 20% for the constraints.

We now evaluate the performance of the slack time and hot-spot clustering improvements that we make to the tree algorithm. Figure 7 shows these results. The slack-time algorithm is faster than the basic tree algorithm except when the number of servers is 10,000, the number of customer requests is 6, or the constraints are at 20 min / 40%; these cases are examined more closely in Figure 9. Slack time achieves a maximum time saving of over 32% compared to the basic tree algorithm when the constraints are at the tightest level tested, 5 min / 10%. For the default parameters, it yields savings of approximately 18%. So, slack time yields a relatively significant improvement in time, especially when constraints are tight and many branches in the tree are therefore infeasible.

Figure 9 presents results similar to those in Figure 8, but for the tree algorithms. Most prominent in the graphs is the steep increase in ART for tight constraints and large numbers of servers with the basic and slack-time tree algorithms; this is the opposite of the results in Figure 8. It can be explained, however, by the increased capacity that we use. In both cases where the ART is large, it is relatively rare for a server to have six passengers: typically, there would either be another server with fewer passengers available to handle the request, or the constraints would be too tight to allow so many passengers. So, when a server is able to get six passengers, it is most likely because the pickup/dropoff points are very close to each other. In these cases, the short distances between the points create a large number of feasible combinations. Although these cases also appear for looser constraints and smaller numbers of servers, the ART is an average, and other six-passenger cases that do not create a large number of combinations are much more common. This also explains why the hot-spot
clustering algorithm is not affected by the trend. The slack-time tree algorithm is not always faster than the basic tree algorithm, because slack time only reduces execution time when there are many infeasible branches that can be pruned.

Additionally, as in Figure 8, the ART in Figure 9 increases gradually with looser constraints starting from the 10 min / 20% setting, because looser constraints also allow more feasible combinations.

Fig. 9. ART for six customer requests when changing constraints in (a) and number of servers in (b). In (c), tree-algorithm ACRT for different capacities; "unlim" indicates unlimited capacity. Only the hotspot clustering algorithm can complete for unlimited capacity. Default parameters are a capacity of 6, 10 min / 20% for the constraints, and 2,000 servers.

Figure 9 (c) shows the ACRT results for different capacities. The ACRT curve for each algorithm breaks off when the algorithm can no longer finish in a reasonable time or exceeds the imposed memory limit of three gigabytes. The hot-spot clustering algorithm is the only one that is able to finish the simulation with a capacity greater than seven, including unlimited capacity (marked as unlim in the figure).

From this figure, we can see that while the basic and slack-time tree algorithms are unable to continue processing when the problem sizes become too large, the hot-spot clustering algorithm scales to higher capacities. This also confirms our hypothesis that the biggest issue for unlimited capacity is situations where a large number of passengers wish to depart from a single point; hot-spot clustering combines these points in the tree.

The maximum number of passengers in a single server at unlimited capacity is 17, while the average is 1.7 (this is with the default parameters, so the number of servers is two thousand). The average in the 20% most filled servers is a bit higher than 3.9. This indicates that the majority of vehicles in a server fleet should be five-person cars (with one of the five seats taken by the driver), but for some requests larger vehicles are needed.

VII. Related Work 

Our work is related to nearest neighbor (NN) search on moving objects over road networks. Early work on NN search on road networks focuses on data models that are easy to implement and serve as a foundation for NN queries [14]. Later research has focused on continuous monitoring of nearest neighbors in highly dynamic scenarios, where the queries and the data objects move frequently on a road network [15]. A recent paper addresses the problem of monitoring the k nearest neighbors to a dynamically changing path in road networks: given the destination a user is going to, this new query returns the k-NN with respect to the shortest path connecting the destination and the user's current location [16]. Güting et al. proposed algorithms to find the k nearest neighbors to $mq$ within $D$ for any instant of time within the lifetime of $mq$, given a set of moving object trajectories $D$ and a query trajectory $mq$ [17]. A nearest neighbor query on a road network is only a minor step in the ridesharing system, one that can help filter the initial set of candidate taxis.

The trip grouping algorithm [5] groups "close-by" cab requests using a set of heuristics. Requests are queued for a user-given waiting time before being scheduled. The heuristics include grouping requests upon expiration, estimating combination savings using pairwise request combination gain, and greedy grouping. The trip grouping algorithm is then expressed as a continuous stream query and optimized by space partitioning and parallelization. This method is heuristic-based and does not provide waiting and riding time service guarantees as our method does.

In operations research, early research on this problem mostly focuses on a single vehicle and a static scenario where the set of requests is known ahead of time. This is unrealistic for large-scale and ad-hoc services such as a taxi service. The problem is, unsurprisingly, NP-hard, and only small instances can be solved to optimality. Exact dynamic programming algorithms have been developed [18]. Note that the problem without a deadline can be considered a special case of the problem with a deadline where the deadline is infinite. Once a fixed deadline is given, we can construct subproblems using these deadlines, and thus dynamic programming can be employed. However, in our case, since the maximal waiting time and the service level are two separate constraints, a trip request cannot be enforced with a single fixed completion deadline. Thus, the dynamic programming approaches cannot be applied to our problem. We also note that our problem can be considered more general than the fixed deadline problem: given a fixed deadline $t$, the maximal waiting time can be defined as $w = t - (1+\epsilon) d(s, e)$. Thus, our algorithm can also be used for the fixed deadline problem.

In a dynamic single-vehicle DARP, requests come in real time and a server has to make decisions online [6]. In problems without deadlines, the objectives are to minimize the makespan (the time to finish the last request) or the average completion time. The competitive ratio is a standard tool to measure the effectiveness of a dynamic DARP algorithm. An online algorithm $A$ is called $c$-competitive if, for any instance $S$, the cost of $A$ on $S$ is at most $c$ times the offline optimum on $S$. This assumes that an optimal solution is available, which is not the case for the modern large-scale scheduling problem we are addressing.

This paper deals with the multiple-server, dynamic (i.e., real-time) DARP with deadlines. When deadlines are involved, the objectives are threefold: (1) real-time response; (2) minimizing the average completion time; (3) maximizing the number of requests served. The most closely related work is the two-phase insertion technique [19].

Single-vehicle problems are typically solved to optimality by a branch-and-bound algorithm, which may incur exponential time complexity [7]. The state-of-the-art branch-and-cut (BaC) algorithm [8] formulates the multiple-server version of this problem as a mixed-integer program and solves it with branch and cut. BaC can find exact solutions for small to medium size instances (4 vehicles and 32 requests on a moderate PC in tens to hundreds of minutes). It assumes all vehicles and requests are available ahead of time, which is not realistic for a dynamic taxi service with thousands of vehicles serving throughout the day. Nevertheless, the solution can be adapted to accommodate attempts to combine new requests with existing vehicle routes. In this paper, we compare our kinetic tree based approach to a branch-and-bound approach and a mixed-integer programming approach.

VIII. Conclusion 

In this paper, we formulate the real-time ridesharing problem and propose a kinetic tree algorithm, with optimizations, to dynamically match real-time trip requests to servers in a road network to allow ridesharing. As shown by experiments on a large taxi dataset, the proposed algorithm outperforms commonly used approaches, including branch and bound and mixed-integer programming. In the future, we would like to consider uncertainty issues in scheduling; this is very important and may be a major roadblock to achieving large-scale ridesharing.